Goto

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

Neural Information Processing Systems

In the proportional asymptotic limit where $n,d,N\to\infty$ at the same rate (number of samples $n$, input dimension $d$, and network width $N$), and an idealized student-teacher setting where the teacher $f^*$ is a single-index model, we compute the prediction risk of ridge regression on the conjugate kernel after one gradient step on the first-layer weight matrix $\boldsymbol{W}$ with learning rate $\eta$. We consider two scalings of this first-step learning rate $\eta$. For small $\eta$, we establish a Gaussian equivalence property for the trained feature map and prove that the learned kernel improves upon the initial random features model, but cannot defeat the best linear model on the input. In contrast, for sufficiently large $\eta$, we prove that for certain $f^*$ the same ridge estimator on trained features can go beyond this ``linear regime'' and outperform a wide range of (fixed) kernels. Our results demonstrate that even one gradient step can lead to a considerable advantage over random features, and they highlight the role of learning rate scaling in the initial phase of training.
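
The pipeline described in the abstract can be illustrated with a minimal sketch, assuming a two-layer network $f(\boldsymbol{x}) = \boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$ with tanh activation, a single-index tanh teacher, and squared loss; the sizes, link function, learning rate, and ridge penalty below are illustrative choices, not the paper's exact setup. It takes one full-batch gradient step on $\boldsymbol{W}$ (with $\boldsymbol{a}$ held fixed) and then fits ridge regression on the resulting features.

```python
import numpy as np

# Minimal sketch (not the paper's exact protocol): single-index teacher,
# one full-batch gradient step on the first-layer weights W, then ridge
# regression on the post-activation ("conjugate kernel") features.

rng = np.random.default_rng(0)
n, d, N = 2000, 500, 500          # samples, input dim, hidden width (proportional regime)
eta, lam = 1.0, 1e-3              # first-step learning rate and ridge penalty (illustrative)

beta = rng.standard_normal(d) / np.sqrt(d)      # teacher direction (single index)

def teacher(X):
    return np.tanh(X @ beta)                    # assumed link function

X = rng.standard_normal((n, d))
y = teacher(X)

W = rng.standard_normal((d, N)) / np.sqrt(d)    # first-layer weights
a = rng.standard_normal(N) / np.sqrt(N)         # second-layer weights, kept fixed

def features(X, W):
    return np.tanh(X @ W)                       # activation sigma = tanh (assumption)

# One gradient step on W for the squared loss (1/2n) * ||features(X, W) @ a - y||^2.
H = features(X, W)
resid = H @ a - y                               # shape (n,)
grad_W = X.T @ (resid[:, None] * (1 - H ** 2) * a[None, :]) / n
W1 = W - eta * grad_W

# Ridge regression on the trained features.
Phi = features(X, W1)
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)

# Prediction risk on fresh data.
X_test = rng.standard_normal((5000, d))
risk = np.mean((features(X_test, W1) @ w_ridge - teacher(X_test)) ** 2)
print(f"test MSE after one step + ridge: {risk:.4f}")
```

Re-running the sketch with a small versus a large `eta` gives a rough feel for the two learning-rate scalings contrasted in the abstract.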


High-dimensional Asymptotics of Denoising Autoencoders

Neural Information Processing Systems

We address the problem of denoising data drawn from a Gaussian mixture using a two-layer non-linear autoencoder with tied weights and a skip connection. We consider the high-dimensional limit where the number of training samples and the input dimension jointly tend to infinity while the number of hidden units remains bounded. We provide closed-form expressions for the denoising mean-squared test error. Building on this result, we quantitatively characterize the advantage of the considered architecture over the autoencoder without the skip connection, which is closely related to principal component analysis. We further show that our results accurately capture the learning curves on a range of real datasets.
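
As a rough illustration of the setting, the sketch below trains a tied-weight autoencoder with a scalar skip connection to denoise a two-cluster Gaussian mixture; the weight scalings, tanh activation, noise level, and training loop are assumptions made for this example rather than the paper's exact architecture or analysis.

```python
import torch
import torch.nn as nn

# Minimal sketch (scalings are assumptions): a two-layer denoising autoencoder with
# tied weights W and a trainable scalar skip connection b,
#     xhat(x_noisy) = b * x_noisy + tanh(x_noisy @ W / sqrt(d)) @ W.T / sqrt(d),
# trained to reconstruct clean samples x from noisy inputs x + sqrt(Delta) * noise.

class TiedSkipDAE(nn.Module):
    def __init__(self, d, p):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d, p) / d ** 0.5)  # tied encoder/decoder weights
        self.b = nn.Parameter(torch.tensor(0.5))              # scalar skip-connection strength

    def forward(self, x_noisy):
        d = x_noisy.shape[1]
        h = torch.tanh(x_noisy @ self.W / d ** 0.5)           # bounded number of hidden units p
        return self.b * x_noisy + h @ self.W.T / d ** 0.5

def gaussian_mixture(n, d, mu, generator):
    # Two clusters centered at +mu and -mu with unit-variance Gaussian noise.
    labels = torch.randint(0, 2, (n,), generator=generator) * 2 - 1
    return labels[:, None] * mu[None, :] + torch.randn(n, d, generator=generator)

d, p, n, Delta = 500, 2, 4000, 0.5                            # illustrative sizes / noise level
g = torch.Generator().manual_seed(0)
mu = torch.randn(d, generator=g)                              # cluster mean of the mixture
x_clean = gaussian_mixture(n, d, mu, g)
x_noisy = x_clean + Delta ** 0.5 * torch.randn(n, d, generator=g)

model = TiedSkipDAE(d, p)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = ((model(x_noisy) - x_clean) ** 2).mean()           # denoising MSE objective
    loss.backward()
    opt.step()

# Denoising test error on fresh samples from the same mixture.
x_test = gaussian_mixture(2000, d, mu, g)
x_test_noisy = x_test + Delta ** 0.5 * torch.randn(2000, d, generator=g)
with torch.no_grad():
    test_mse = ((model(x_test_noisy) - x_test) ** 2).mean().item()
print(f"denoising test MSE: {test_mse:.4f}")
```

Setting `self.b` to zero recovers the skip-connection-free autoencoder that the abstract compares against.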

